An amino acid sequence encodes a message that determines the shape and function of a protein. This message is highly degenerate in that many different sequences can code for proteins with essentially the same structure and activity. Comparison of different sequences with similar messages can reveal key features of the code and improve understanding of how a protein folds and how it performs its function.



The genome is manifest largely in the set of proteins that it encodes. It is the ability of these proteins to fold into unique three-dimensional structures that allows them to function and carry out the instructions of the genome. Thus, comprehending the rules that relate amino acid sequence to structure is fundamental to an understanding of biological processes. Because an amino acid sequence contains all of the information necessary to determine the structure of a protein (1), it should be possible to predict structure from sequence, and subsequently to infer detailed aspects of function from the structure. However, both problems are extremely complex, and it seems unlikely that either will be solved in an exact manner in the near future. It may be possible to obtain approximate solutions by using experimental data to simplify the problem. In this article, we describe how an analysis of allowed amino acid substitutions in proteins can be used to reduce the complexity of sequences and reveal important aspects of structure and function.



Methods for Studying Tolerance to
Sequence Variation

There are two main approaches to studying the tolerance of an amino acid sequence to change. The first method relies on the process of evolution, in which mutations are either accepted or rejected by natural selection. This method has been extremely powerful for proteins such as the globins or cytochromes, for which sequences from many different species are known (2-7). The second approach uses genetic methods to introduce amino acid changes at specific positions in a cloned gene and uses selections or screens to identify functional sequences. This approach has been used to great advantage for proteins that can be expressed in bacteria or yeast, where the appropriate genetic manipulations are possible (3, …).

The end results of both methods are lists of active sequences that can be compared and analyzed to identify sequence features that are essential for folding or function. If a particular property of a side chain, such as charge or size, is important at a given position, only side chains that have the required property will be allowed. Conversely, if the chemical identity of the side chain is unimportant, then many different substitutions will be permitted.
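The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the function name `allowed_substitutions` and the example sequences are hypothetical, and the sequences are assumed to be already aligned to equal length.

```python
def allowed_substitutions(active_sequences):
    """Return, for each aligned position, the set of residues
    observed among the active (functional) sequences."""
    if not active_sequences:
        return []
    length = len(active_sequences[0])
    if any(len(seq) != length for seq in active_sequences):
        raise ValueError("sequences must be aligned to the same length")
    # zip(*...) walks the alignment column by column.
    return [set(column) for column in zip(*active_sequences)]


# Hypothetical aligned active sequences (illustrative only, not real data).
active = ["LKDAV", "LRDGV", "IEDSV", "LQDTV"]
allowed = allowed_substitutions(active)

for position, residues in enumerate(allowed, start=1):
    # A small set suggests a constrained position; a large, chemically
    # diverse set suggests that side-chain identity matters little there.
    print(position, "".join(sorted(residues)))
```

In this toy alignment, the third position accepts only Asp while the second accepts several charged or polar residues, which is the kind of contrast the lists of active sequences are meant to expose.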

Studies in which these methods were used have revealed that proteins are surprisingly tolerant of amino acid substitutions (2-4, 11). For example, in studying the effects of approximately 1500 single amino acid substitutions at 142 positions in lac repressor, Miller and co-workers found that about one-half of all substitutions were phenotypically silent (11). At some positions, many different, nonconservative substitutions were allowed. Such residue positions play little or no role in structure and function. At other positions, no substitutions or only conservative substitutions were allowed. These residues are the most important for lac repressor activity.

What roles do invariant and conserved side chains play in proteins? Residues that are directly involved in protein functions such as binding or catalysis will certainly be among the most conserved. For example, replacing the Asp in the catalytic triad of trypsin with Asn results in a 10^4-fold reduction in activity (12). A similar loss of activity occurs in λ repressor when a DNA binding residue is changed from Asn to Asp (13). To carry out their function, however, these catalytic residues and binding residues must be precisely oriented in three dimensions. Consequently, mutations in residues that are required for structure or stability can also have dramatic effects on activity (10, 14-16). Hence, many of the residues that are conserved in sets of related sequences play structural roles.



Substitutions at Surface and Buried Positions

In their initial comparisons of the globin sequences, Perutz and co-workers found that most buried residues require nonpolar side chains, whereas few features of surface side chains are generally conserved. Similar results have been seen for other protein families (5, 7, 17, 18). An example of the sequence tolerance at surface versus buried sites can be seen in Fig. 1, which shows the allowed substitutions in λ repressor at residue positions that are near the dimer interface but distant from the DNA binding surface of the protein. These substitutions were identified by a functional selection applied after cassette mutagenesis. The solvent accessibility of each wild-type side chain in the crystal structure of the dimer is also shown in Fig. 1.

Fig. 1. Amino acid substitutions allowed at positions in λ repressor. The wild-type sequence is shown along the center line, together with the allowed substitutions at each position, which were identified by using a cassette method and applying a functional selection. Also shown is the fractional solvent accessibility (42) of the wild-type side chain in the protein dimer.

Constraints on Core Sequences



Because core residue positions appear to be extremely important for protein folding or stability, we must understand the factors that dictate whether a given core sequence will be acceptable. Figure 2 shows the substitutions allowed at residue positions that form the hydrophobic core of the amino-terminal domain of λ repressor (20). The acceptability of many different residues at each core position presumably reflects the fact that the hydrophobic effect, unlike hydrogen bonding, does not depend on specific residue pairings. A core stabilized largely by hydrogen bonds and salt bridges would probably be difficult to construct, because hydrogen bonds require pairing of specific partners and it is hard to arrange a core in which all hydrogen bonding needs can be satisfied (22).

Core volume is also constrained, although some variation is acceptable. In λ repressor, the overall core volume of acceptable sequences can vary by about 10%. Changes at individual sites can be larger, depending on the sequence context. With occasional exceptions, the core must remain hydrophobic and maintain a reasonable packing density. However, because the core is composed of side chains that can assume only a limited number of conformations, efficient packing must be achieved without steric clashes. In experiments in which three core residues were mutated simultaneously, steric compatibility was required in addition to appropriate volume and hydrophobicity.

Fig. 2. Amino acid substitutions allowed in the core of λ repressor. The allowed substitutions at each position are shown below the wild-type side chains. These substitutions were identified by randomly mutating one to four residues at a time by using a cassette method and applying a functional selection.



The Informational Importance of Surface Sites

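We have noted that many surface sites can tolerate a wide variety of side chains, including hydrophilic and hydrophobic residues. This result might be taken to indicate that surface positions contain little structural information. However, Bashford and co-workers found a bias against large hydrophobic residues at many surface positions in the globins. Although a hydrophobic residue may be acceptable at any one surface site, the surface as a whole can probably tolerate only a moderate number of hydrophobic side chains.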

Identification of Residue Roles from
Sets of Sequences

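Often, a protein of interest is a member of a family of related sequences. What can be inferred from the pattern of allowed substitutions at positions in sets of aligned sequences generated by genetic or phylogenetic methods? Residue positions that can accept a number of different side chains, including charged and highly polar residues, are almost certain to be on the protein surface. Residue positions that remain hydrophobic, whether variable or not, are likely to be buried within the structure. In Fig. 3, those positions that can accept hydrophilic side chains are distinguished from those that cannot. The obligate hydrophobic positions define the core of the structure, whereas positions that can accept hydrophilic side chains define the surface.

Residues that are important for function can be distinguished from those that are important for structure by comparing the stability and activity of a set of mutant sequences.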
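The idea of reading surface and core positions from a set of aligned sequences can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' method: the residue groupings, the three labels, and the function name `classify_positions` are choices made here for the example, and the alignment is hypothetical.

```python
# Assumed groupings (illustrative): charged or highly polar residues,
# and strongly hydrophobic residues. Other residues are left ambiguous.
HYDROPHILIC = set("DEKRNQ")
HYDROPHOBIC = set("WIFLMVC")


def classify_positions(aligned_sequences):
    """Label each aligned position from the residues it tolerates."""
    labels = []
    for column in zip(*aligned_sequences):
        observed = set(column)
        if observed & HYDROPHILIC:
            # At least one functional sequence has a hydrophilic residue here.
            labels.append("surface")
        elif observed <= HYDROPHOBIC:
            # Only strongly hydrophobic residues are ever observed.
            labels.append("core")
        else:
            labels.append("unassigned")
    return labels


# Hypothetical alignment (illustrative only, not real data).
family = ["LKDAV", "LRDGV", "IEDSV", "LQDTV"]
print(classify_positions(family))
```

Note that an invariant hydrophilic position, such as the third column here, is labeled "surface" by this rule even though its invariance may reflect a functional role; separating those cases needs the independent stability and activity comparison mentioned in the text.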



At present, the only reliable method for predicting the structure of a protein from its sequence is by identifying significant sequence similarity to a protein of known structure, which can be difficult between distantly related proteins. Because the number of known sequences greatly exceeds the number of known structures, it would be advantageous to increase the reach of such similarity searches. In a normal homology search, a single test sequence is compared with each sequence in the database. Some residue positions, however, are more important than others and should be weighted accordingly. Moreover, certain regions of the sequence are more likely than others to contain gaps. Sets of related sequences have been used to combine such information into more appropriately weighted sequence searches and alignments. These methods were used to align the sequences of retroviral proteases with aspartic proteases, which in turn allowed construction of a three-dimensional model for the protease of human immunodeficiency virus type 1. Comparison with the recently determined crystal structure of this protein revealed reasonable agreement in many areas of the predicted structure.

The structural information at most surface sites is highly degenerate. Except for functionally important residues, exterior positions seem to be important chiefly in maintaining a reasonably polar surface. The information contained in buried residues is also degenerate, the main requirement being that these residues remain hydrophobic. Thus, at its most basic level, the key structural message in an amino acid sequence may reside in its specific pattern of hydrophobic and hydrophilic residues. This is meant in an informational sense. Clearly, the precise structure and stability of a protein depends on a large number of detailed interactions. It is possible, however, that structural prediction at a more primitive level can be accomplished by concentrating on the most basic informational aspects of an amino acid sequence. For example, amphipathic patterns can be extracted from aligned sets of sequences and used, in some cases, to identify secondary structures.

If a region of secondary structure is packed against the hydrophobic core, a pattern of hydrophobic residues reflecting the periodicity of the secondary structure can be expected (34). These patterns can be obscured in individual sequences by hydrophobic residues on the protein surface. It is rare, however, for a surface position to remain hydrophobic over the course of evolution. Consequently, the amphipathic patterns expected for simple secondary structures can be much clearer in a set of related sequences. This principle is illustrated in Fig. 4, which shows helical hydrophobic moment plots for the Antennapedia homeodomain sequence (Fig. 4A) and for a composite sequence derived from a set of homologous homeodomain proteins (Fig. 4B) (35). The hydrophobic moment is a simple measure of the degree of amphipathic character of a sequence in a given secondary structure. The amphipathic character of the helical regions in the Antennapedia protein (36) is clearly revealed only by the analysis of the combined set of homeodomain sequences. The secondary structure of Arc repressor, a small DNA-binding protein, was recently predicted by a similar method (8) and confirmed by nuclear magnetic resonance studies (37).
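A helical hydrophobic moment of the kind plotted in Fig. 4 can be sketched as the length of a vector sum in which each residue contributes its hydrophobicity along a direction that advances 100° per residue, the usual spacing for an α helix. The code below is a minimal sketch under stated assumptions: the three residue groups follow the Fig. 4 legend, but the numeric values +1, 0, and -1, the window length of 11, and the function names are choices made here for illustration, not values taken from the article.

```python
import math

# Residue groups from the Fig. 4 legend; the numeric values are assumed.
GROUP_VALUE = {}
GROUP_VALUE.update({aa: 1.0 for aa in "WIFLMVC"})   # H1, high hydrophobicity
GROUP_VALUE.update({aa: 0.0 for aa in "YPATHGS"})   # H2, medium hydrophobicity
GROUP_VALUE.update({aa: -1.0 for aa in "QNEDKR"})   # H3, low hydrophobicity


def hydrophobic_moment(segment, delta_deg=100.0):
    """Magnitude of the vector sum of residue hydrophobicities, with
    successive residues rotated by delta_deg around the helix axis."""
    delta = math.radians(delta_deg)
    sin_sum = sum(GROUP_VALUE[aa] * math.sin(i * delta)
                  for i, aa in enumerate(segment))
    cos_sum = sum(GROUP_VALUE[aa] * math.cos(i * delta)
                  for i, aa in enumerate(segment))
    return math.hypot(sin_sum, cos_sum)


def moment_profile(sequence, window=11):
    """Hydrophobic moment of every window along the sequence."""
    return [hydrophobic_moment(sequence[i:i + window])
            for i in range(len(sequence) - window + 1)]


# A uniform segment has no preferred face; an idealized amphipathic
# segment puts hydrophobic residues on one side of the helix.
uniform = "L" * 18
amphipathic = "".join(
    "L" if math.cos(math.radians(100 * i)) > 0 else "K" for i in range(18)
)
print(hydrophobic_moment(uniform), hydrophobic_moment(amphipathic))
```

For a set of aligned sequences, one representative hydrophobicity value per position would be substituted for the single-sequence values before computing the profile, along the lines described in the Fig. 4 legend.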

The specific pattern of hydrophobic and hydrophilic residues in an amino acid sequence must limit the number of different structures a given sequence can adopt and may indeed define its overall fold. If this is true, then the arrangement of hydrophobic and hydrophilic residues should be a characteristic feature of a particular fold. Sweet and Eisenberg have shown that the correlation of the pattern of hydrophobicity between two protein sequences is a good criterion for their structural relatedness (38). In addition, several studies indicate that patterns of obligatory hydrophobic positions identified from aligned sequences are distinctive features of sequences that adopt the same structure (4, 29, 38, 39). Thus, the order of hydrophobic and hydrophilic residues in a sequence may actually be sufficient information to determine the basic folding pattern of a protein sequence.

Although the pattern of sequence hydrophobicity may be a characteristic feature of a particular fold, it is not yet clear how such patterns could be used for prediction of structure de novo. It is important to understand how patterns in sequence space can be related to structures in conformation space. Lau and Dill have approached this problem by studying the properties of simple sequences composed only of H (hydrophobic) and P (polar) groups on two-dimensional lattices (40). An example of such a representation is shown in Fig. 5. Residues adjacent in the sequence must occupy adjacent squares on the lattice, and no two residues can occupy the same space. Free energies of particular conformations are evaluated with a single term, an attraction between H groups. For short chains, an exhaustive conformational search for all 1024 possible sequences of H and P residues was possible. For longer sequences only a representative fraction of the allowed sequence or conformation space could be explored. The significant results were as follows: (i) not all sequences can fold into a "native" structure and only a few sequences form a unique native structure; (ii) the probability that a sequence will adopt a unique native structure increases with chain length; and (iii) the native states are compact, contain a hydrophobic core surrounded by polar residues, and contain significant secondary structure. Although the gap between these two-dimensional simulations and three-dimensional structures is large, the use of simple rules and representations yields results similar to those expected for real proteins. Three-dimensional lattice methods are also beginning to be developed and evaluated (41).
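The lattice rules above (chain connectivity, excluded volume, and a single attractive term between H groups) are simple enough to enumerate exhaustively for short chains. The following is a minimal sketch of such an HP model, not a reimplementation of the published simulations: the function names, the energy of one unit per H-H contact, and the choice to discard rotations and mirror images are assumptions made here. Note that 1024 = 2^10, so the exhaustive search described corresponds to chains of 10 residues; the demonstration below uses 6 to stay fast.

```python
from itertools import product

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def conformations(n):
    """Yield self-avoiding walks of n >= 2 residues on a square lattice.
    The first step is fixed along +x and the first turn must go to +y,
    so rotations and mirror images are not counted separately."""
    if n < 2:
        raise ValueError("need at least two residues")

    def extend(path, occupied, turned):
        if len(path) == n:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            if not turned and dy == -1:
                continue  # mirror image of a walk that turns toward +y
            nxt = (x + dx, y + dy)
            if nxt in occupied:
                continue  # no two residues may share a lattice site
            path.append(nxt)
            occupied.add(nxt)
            yield from extend(path, occupied, turned or dy != 0)
            path.pop()
            occupied.remove(nxt)

    start = [(0, 0), (1, 0)]
    yield from extend(start, set(start), False)


def hh_contacts(sequence, conf):
    """Count H-H pairs on adjacent sites that are not chain neighbors."""
    index = {pos: i for i, pos in enumerate(conf)}
    count = 0
    for i, (x, y) in enumerate(conf):
        if sequence[i] != "H":
            continue
        for dx, dy in ((1, 0), (0, 1)):  # right and up: each pair once
            j = index.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                count += 1
    return count


def native_degeneracy(sequence, confs):
    """Return (maximum H-H contacts, number of conformations reaching it)."""
    best, ways = -1, 0
    for conf in confs:
        c = hh_contacts(sequence, conf)
        if c > best:
            best, ways = c, 1
        elif c == best:
            ways += 1
    return best, ways


def unique_folders(n):
    """Sequences whose lowest-energy state is a single conformation."""
    confs = list(conformations(n))
    result = []
    for letters in product("HP", repeat=n):
        seq = "".join(letters)
        best, ways = native_degeneracy(seq, confs)
        if best > 0 and ways == 1:
            result.append(seq)
    return result


n = 6
folders = unique_folders(n)
print(f"{len(folders)} of {2 ** n} sequences of length {n} have a unique native state")
```

Calling `unique_folders(10)` covers all 1024 sequences considered in the text, at the cost of a noticeably longer run in pure Python.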



There is more information in a set of related sequences than in a single sequence. A number of practical applications arise from an analysis of the tolerance of residue positions to change. First, such information permits the evaluation of a residue's importance to the function and stability of a protein. This ability to identify the essential elements of a protein sequence may improve our understanding of the determinants of protein folding and stability as well as protein function. Second, patterns of tolerance to amino acid substitutions of varying hydrophilicity can help to identify residues likely to be buried in a protein structure and those likely to occupy surface positions.



Fig. 4. Helical hydrophobic moments calculated by using (A) the Antennapedia homeodomain sequence or (B) a set of 39 aligned homeodomain sequences (35). The bars indicate the extents of the helical regions identified in nuclear magnetic resonance studies of the Antennapedia homeodomain (36). To determine hydrophobic moments, residues were assigned to one of three groups: H1 (high hydrophobicity = Trp, Ile, Phe, Leu, Met, Val, or Cys); H2 (medium hydrophobicity = Tyr, Pro, Ala, Thr, His, Gly, or Ser); and H3 (low hydrophobicity = Gln, Asn, Glu, Asp, Lys, or Arg). For the aligned homeodomain sequences, the residues at each position were sorted by their hydrophobicity by using the scale of Fauchère and Pliska. Arg and Lys were not counted unless no other residue was found at the position, because they contain long aliphatic side chains. To allow for possible sequence errors and rare exceptions, the most hydrophilic residue was discarded, and the most hydrophilic remaining residue was then chosen to represent the hydrophobicity of the position. The angles between successive residues were 100°. The horizontal axis shows residue number (10 to 40).
Fig. 5. A representation of one compact conformation for a particular sequence of H and P residues on a two-dimensional square lattice. (Adapted from (40), with permission of the American Chemical Society.)
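Sets of related sequences can also reveal amphipathic patterns that can be used to identify secondary structure. Third, incorporating a knowledge of allowed substitutions can improve the ability to detect and align distantly related proteins because the essential residues can be given prominence in the alignment scoring.

As more sequences are determined, it becomes increasingly likely that a protein of interest is a member of a family of related sequences. If this is not the case, it is now possible to use genetic methods to generate lists of allowed amino acid substitutions. Consequently, it may not be necessary to predict structure from individual protein sequences. Instead, by reducing sequence space through the identification of key residues and by reducing conformation space with simplified representations such as lattice methods, it may be possible to develop algorithms to generate a limited number of trial structures that can then be evaluated by further experiments and more sophisticated calculations.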